Avoiding the Comparative Fallacy in the Annotation of Learner Corpora
Abstract
Corpora of second language learner data are increasingly used to support research on second language acquisition (SLA) topics (e.g., Römer, 2009; Wulff et al., 2009), but corpus annotation has seen little use. For many SLA questions, an unannotated corpus suffices: one can search it for specific words to find relevant examples. For example, to examine how L2 learners use modal verbs (cf., e.g., Aijmer, 2002), one can search for the specific lexical items (can, should, etc.) and analyze the output by hand. Searching for syntactic patterns, however, such as examining wh-movement (e.g., Juffs, 2005; Wolfe-Quintero, 1992; Schachter, 1989), requires more linguistic abstraction (cf., e.g., Lüdeling, 2010). Take the learner sentence (1): what kind of search over specific words addresses questions about the function of whom? Even if we retrieve all instances of whom in a corpus, we still have to determine whether each one is a relative clause marker, whether it involves subject or object extraction, and what the depth of embedding is; we then need to do the same for that, which, or even other prepositional objects. For such questions, the data must be marked with syntactic annotation.
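The contrast between word-level search and syntactic questions can be illustrated with a minimal sketch. The sentences below are invented toy examples, not data from any corpus discussed here, and the helper name `find_word` is hypothetical. The sketch shows that a plain lexical search retrieves modal verbs and instances of whom equally easily, but returns nothing about the grammatical function of each hit.

```python
import re

# Hypothetical toy learner sentences, invented for illustration only.
LEARNER_SENTENCES = [
    "I can to speak English very well.",
    "She should studies more for the exam.",
    "The man whom I met yesterday is my teacher.",
    "Whom did you say that called you?",
]


def find_word(sentences, words):
    """Return (sentence_index, matched_word) pairs for whole-word,
    case-insensitive matches of any item in `words`."""
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for index, sentence in enumerate(sentences):
        for match in pattern.finditer(sentence):
            hits.append((index, match.group(1).lower()))
    return hits


# Lexical search is enough to collect modal verbs for manual analysis.
modal_hits = find_word(LEARNER_SENTENCES, ["can", "should"])

# The same search finds every "whom", but each hit is only a string match:
# nothing here says whether it marks a relative clause or a question,
# or whether the extraction is from subject or object position.
whom_hits = find_word(LEARNER_SENTENCES, ["whom"])
```

Both searches return only sentence indices and matched strings. Distinguishing the two uses of whom in the toy data would require information that is not in the raw text, which is the gap that syntactic annotation is meant to fill.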
Similar resources
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
Phrase Structure Annotation and Parsing for Learner English
There has been almost no work on phrase structure annotation and parsing specially designed for learner English despite the fact that they are useful for representing the structural characteristics of learner English. To address this problem, in this paper, we first propose a phrase structure annotation scheme for learner English and annotate two different learner corpora using it. Second, we s...
A Comparative Analysis of Lexical Bundles in Journalistic Writing in English and Persian: A Contrastive Linguistic Perspective
This paper investigates the use of ‘lexical bundles’ in two broad corpora of journalistic writing. The aim of this study is to compare the use of lexical bundles in the two domains, one consisting of newspaper articles written in English and published in England, and the other comprising newspaper articles written in Persian from Iranian publications. For this purpose, the frequency...
Error Annotation for Corpus of Japanese Learner English
In this paper, we discuss how error annotation for learner corpora should be done by explaining the state of the art of error tagging schemes in learner corpus research. Several learner corpora, including the NICT JLE (Japanese Learner English) Corpus that we have compiled, are annotated with error tagsets designed by categorizing “likely” errors implied from the existing canonical grammar rules...